Multi-omics analysis reveals novel loci and a candidate regulatory gene of unsaturated fatty acids in soybean (Glycine max (L.) Merr)

Background Soybean is a major oil crop; the nutritional components of soybean oil are mainly controlled by unsaturated fatty acids (FA). Unsaturated FAs mainly include oleic acid (OA, 18:1), linoleic acid (LLA, 18:2), and linolenic acid (LNA, 18:3). The genetic architecture of unsaturated FAs in soybean seeds has not been fully elucidated, although many independent studies have been conducted. A 3 V multi-locus random single nucleotide polymorphism (SNP)-effect mixed linear model (3VmrMLM) was established to identify quantitative trait loci (QTLs) and QTL-by-environment interactions (QEIs) for complex traits. Results In this study, 194 soybean accessions with 36,981 SNPs were calculated using the 3VmrMLM model. As a result, 94 quantitative trait nucleotides (QTNs) and 19 QEIs were detected using single-environment (QTN) and multi-environment (QEI) methods. Three significant QEIs, namely rs4633292, rs39216169, and rs14264702, overlapped with a significant single-environment QTN. Conclusions For QTNs and QEIs, further haplotype analysis of candidate genes revealed that the Glyma.03G040400 and Glyma.17G236700 genes were beneficial haplotypes that may be associated with unsaturated FAs. This result provides ideas for the identification of soybean lipid-related genes and provides insights for breeding high oil soybean varieties in the future. Supplementary Information The online version contains supplementary material available at 10.1186/s13068-024-02489-2.


Background
Soybean [Glycine max (L.) Merr.], a major oil crop, is commonly used in cooking oil [1].Soybean oil is mainly composed of saturated fatty acids (FAs) and unsaturated FAs.Among them, saturated FAs include palmitic and stearic acids, and unsaturated FAs include oleic (OA), linoleic acid (LLA), and linolenic acids (LNA) [2,3].Unsaturated FA is the main component of vegetable oil, accounting for more than 80% [4].The increase in the content of OA, a monounsaturated FA, can improve oxidative stability and prevent oxidation [4].LLA and LNA are polyunsaturated FAs and are very beneficial to human health [5].However, the LLA and LNA show poor stability at a high temperature and are easily oxidized [6].Thus, an important goal of soybean breeders is to increase the OA level and reduce the LLA and LNA content [7,8].
Genome-wide association study (GWAS) mapping can identify the genetic basis of a variety of complex traits [9].To date, the single-locus GWAS method has been widely applied to mine genetic loci underlying important agronomic traits, including 100-seed weight in soybean and oil content and yield-related traits in maize [2,10,11].However, quantitative trait nucleotides (QTNs) have been detected using the single-locus GWAS method, which has limited ability to detect QTNs because quantitative traits are affected by a polygenic background [12].
Currently, the mixed linear model (MLM) method can correct population structure and family relationships and is widely used [13].Based on the MLM method, single-locus GWAS methods have been widely proposed, including EMMAX, FaST-LMM and GEMMA [12,14,15].However, single-locus GWAS methods generally require Bonferroni correction and can be affected by a polygenic background.To overcome this problem in single-locus GWAS methods, multi-locus GWAS methods have been applied, in which statistics are applied to all loci [16].These multi-locus GWAS methods mainly include FASTmrEMMA, FASTmrMLM, FarmCPU, and pLARmEB [17][18][19][20].However, these methods have a high calculation burden, and the advantages of QTNby-environment interactions (QEIs) have not been fully considered.
To address this, a new multi-locus GWAS model, the 3 V multi-locus random-SNP-effect mixed linear model (3VmrMLM), has been presented [21].This method improves the QTL detection capability and can analyze the genetic variation of complex traits.It provides a new method for the gene discovery of complex traits.
In this study, a dataset of 194 soybean accessions with 36,981 SNPs was applied [2].We analyzed the unsaturated FA content in this population of 194 soybean accessions based on the multi-locus GWAS model (3VmrMLM).Our aim was to detect significant QEIs and stable QTNs compared with the results of our previous study and other independent studies and to further identify candidate genes related to unsaturated FA content.

Phenotypic variation of three soybean unsaturated FA compositions
The distribution of unsaturated FA content (including OA, LLA, and LNA) in the 194 soybean accessions is shown in Table 1.The coefficient of variation (CV%) differed among the three years.In 2013, the unsaturated FA content had the highest CV at 51% (OA), 48% (LLA), and 52% (LNA).In 2014, the CVs of OA, LLA, and LNA were relatively consistent at 28%, 25%, and 29%, respectively.In 2015, the CV of unsaturated FAs was basically the same as in 2014 (Table 1).The heritabilities of OA, LLA, and LNA were 0.41, 0.36, and 0.35, respectively (Table 1).The above results showed that the content of unsaturated FAs was affected by the environment.
The correlation coefficient of the unsaturated FA content was calculated.As shown in Fig. 1, OA, LLA, and LNA content had a high correlation within the same year.However, the OA, LLA, and LNA content was not high between different years.In 2013, OA was positively correlated with LLA and LNA (0.92 and 0.83, respectively).In 2014, OA was positively correlated with LLA and LNA (0.84 and 0.65, respectively).In 2015, OA was positively correlated with LLA and LNA (0.79 and 0.64, respectively).These results show that unsaturated FAs affect soybean oil accumulation.

Identification of QTNs for unsaturated FA-related traits using 3VmrMLM
In this study, the unsaturated FA content was reanalyzed using the single-environment QTN model (3VmrMLM).A total of 94 significant QTNs were associated with the unsaturated FA content (LOD score ≥ 3.0).Among them, 30, 34, and 30 QTNs were associated with OA, LLA, and LNA content, respectively (Table 2).

Detection of QEIs for unsaturated FA content using 3VmrMLM with multiple environments
The unsaturated FA content was reanalyzed in 3 years (2013, 2014, and 2015) using the multiple-environment QEI model (3VmrMLM) for identifying QEIs.A total of 19 significant/suggested QEIs were identified (Table 3, Fig. 2).Three significant QEIs overlapped with the above QTNs.In these QEIs, the r 2 value was between 2.01 and 14.67, and the variance value was between 0.03 and 1.33 (Table 3).

Candidate gene prediction of significant QTNs associated with unsaturated FA in soybean
There were 1246 genes identified in the flanking genomic region of each significant QTN using the 3VmrMLM method (Additional file 1: Table S1).We further conducted the Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis.As shown in Additional file 1: Fig. S2A, 201 genes were significantly enriched in metabolism, genetic information processing, environmental information processing, cellular processes, and organismal systems, including lipid metabolism, amino acid metabolism, energy metabolism, transport, and catabolism.The results of the above enrichment analysis showed that some candidate genes around QTN were found in different processes.
The same methods mentioned above were used to analyze candidate genes in the flanking regions of the QEIs.A total of 301 candidate genes were found in the linked regions of significant QEIs (Additional file 1: Table S2).KEGG analysis found that 53 genes were significantly enriched in metabolism, genetic information processing, environmental information processing, and organismal systems, including carbohydrate metabolism and lipid metabolism (Additional file 1: Fig. S2B).In the multiple-environment QEI model, five known SNP markers were identified.In addition, some new SNP markers, including rs6528670, rs1902760, rs4633292, rs2457629, and rs48948953, were related to FA synthesis.Moreover, some known markers were identified in the multiple-environment QEI model, including rs14264702, rs34595703, rs44492166, rs23852645, and rs26951255.Three significant QEIs, namely rs4633292, rs39216169, and rs14264702, overlapped with significant QTN in a single year, of which rs14264702 has been reported [22].

Transcriptomic analysis of HUFA and LUFA soybean seeds
RNA-seq analysis was conducted to reveal the transcriptional regulation of unsaturated FA metabolism in HUFA (high unsaturated fatty acid) and LUFA (low unsaturated fatty acid) soybean seeds.Three comparison groups were analyzed: a comparison group of five HUFA and five LUFA varieties (FHUFA vs. FLUFA); a comparison of 10 HUFA and 10 LUFA varieties (THUFA vs. TLUFA); and a comparison of 15 HUFA and 15 HUFA varieties (HUFA vs. LUFA) (Additional file 1: Table S3).

Metabolic profiling analysis of MHUFA and MLUFA soybean seeds
To determine the unsaturated FA regulatory network at the seed development stage, a non-targeted metabolic profiling analysis was applied.There were 15 high unsaturated FA (HUFA) and 15 low unsaturated FA (LUFA) soybean varieties applied in this study (Additional file 1: Table S3).Multiple metabolites were detected using non-targeted metabolomics, including secondary metabolites, lipids, amino acids, and flavonoids.

Differential accumulation of metabolites with MHUFA and MLUFA content
In this study, the metabolic changes of high and low unsaturated FA content in 30 soybean varieties during the R6 period were studied.In FMHUFA vs. FMLUFA, 29 DAMs were annotated into the KEGG pathway.Among them, the isoflavone pathway had the most DAMs, including Genistein, 8-C-glucosylnaringenin, genistin, and biochanin A (Additional file 1: Fig. S5A).

Co-expression analysis of candidate genes and DAM metabolites
Candidate genes and metabolite networks were analyzed.In the single-environment model, the co-expression network of 30 candidate genes and DAMs in three comparison groups was constructed.

Gene-based association and haplotype analysis of candidate genes
To further determine the relationship between candidate genes and traits, the SNPs of the candidate genes were applied for the gene-based association and haplotype analysis of the candidate genes.According to   the results of candidate gene screening based on gene expression data from qRT-PCR and transcriptomics, Glyma.03G040400 and Glyma.17G236700, as the candidate genes of QTNs and overlapping SNPs of QTNs and QEIs, were studied to understand the gene variations affecting soybean unsaturated FAs and to further determine beneficial haplotypes.Three SNPs were found in the promoter and CDS regions of Glyma.03G040400(Additional file 1: Table S4).SNP markers 39,188,954, 39,189,172, and 39,190,333 showed an association with LLA (Fig. 7A, Additional file 1: Table S4).Among the three haplotypes of Glyma.03G040400,Hap 3 and Hap 2 had a significantly higher LNA content than Hap 1 in 2013 and 2014 (Fig. 7B and C).

Discussion
Soybean is an important oil crop.However, different proportions of FAs may play an important role in soybean oil.Therefore, it is of great significance to improve the content and quality of soybean oil.The single locus method has been widely used to detect genetic variation in crops, including GLM and MLM [23,24].However, single-locus GWAS methods generally need Bonferroni correction and can be affected by a polygenic background.In this study, 194 soybean accessions were analyzed using the 3VmrMLM method (Additional file 1: Figure S1, Table 2).We identified 12, 10, and eight significant/suggested SNPs  2).In addition, we compared 3VmrMLM with a single-locus MLM method by Zhao et al.We detected 63 SNPs using the MLM method.Hence, the 3VmrMLM method detected more significant SNPs than the MLM method.Among these SNPs, four SNPs were found using the MLM and 3VmrMLM methods simultaneously, including rs4953186 rs52833743, rs35024325, and rs6481810, and the discovery of rs35024325 and rs6481810 SNPs has been reported [2,22].
Environmental changes have an important impact on the quality and yield of crops; analysis of multiple environments can increase the detection capability of SNPs.In this study, six, five, and eight QEIs were found for OA, LA, and LLA, respectively (Fig. 2, Table 3).Among these SNPs, five have been reported [22,25,26].A total of 1246 genes around the significant/suggested QTNs were predicted in this study; of them, 40 genes were involved in lipid synthesis (Additional file 1: Table S1).For example, the MYB transcription factor has been reported to affect oil accumulation [27].The OsLTP gene is involved in the transport of lipid molecules in rice [28].In this study, Glyma.03G040400(GmLTP1), located on chromosome 3, was significantly related to LNA using the GLM method based on gene-based association (Additional file 1: Table S4).In Fig. 7 Gene-based association analysis and haplotype analysis.A Gene-based association analysis of Glyma.17G236700related to linolenic content.B, C The relationship between haplotypes and linoleic content analysis of Glyma.17G236700 in 2013 and 2014, respectively.D Gene-based association analysis of Glyma.03G040400related to linoleic content.E, F The relationship between haplotypes and linolenic content analysis of Glyma.03G040400 in 2013 and 2014, respectively.* and ** indicate significance at p < 0.05 and p < 0.01, respectively addition, the GmLTP1 gene was a beneficial haplotype (Fig. 7).
A total of 301 genes around the significant/suggested QEIs were detected in this study (Additional file 1: Table S2).Three significant QEIs, namely rs4633292, rs39216169, and rs14264702, overlapped with significant single-environment QTNs.Among the overlapping SNPs, genes related to FA synthesis and seed development, such as ACBP and FTSH, were identified.ACBPs can play an important role in maintaining lipid homeostasis [29].In addition, we found that the Glyma.17G236700(ACBP) gene had a beneficial haplotype (Fig. 7).

Conclusion
The 3VmrMLM method was more comprehensive for GWAS.This method overcame the huge computational burden of traditional models.In this study, 94 QTNs and 19 QEIs were identified.Five major candidate genes were found.The gene expression data from different soybean tissues and transcriptome data were used to identify Glyma.03G040400 and Glyma.17G236700 as key candidate genes around the SNPs.The beneficial haplotypes of Glyma.03G040400 and Glyma.17G236700 may be helpful for further application in soybean breeding.

Plant materials, field trials, and phenotypic evaluation
An association panel of 194 soybean germplasm resources was planted at Harbin (162.41°E, 45.45° N) in 2013, 2014, and 2015.Field trials were conducted using single-row plots (2 m long and 0.65 m between rows) and a randomized complete block design with three replicates per experimental site.The unsaturated FA content of each sample was determined using gas chromatography (GC-14C, Shimadzu Company, Japan), according to our previous method [30].The OA, LLA, and LNA content were applied in single-environment (QTN) and multienvironment (QEI) analyses.

Genotypic data
A genotypic dataset consisting of 36,981 SNPs from 194 soybean germplasm resources was generated by Specific-Locus Amplified Fragment Sequencing (SLAF-seq), which was reported in Han et al. and Zhao et al. [2,31].The 36,981 SNPs were distributed on 20 soybean chromosomes, with minor allele frequencies > 0.04 and missing data of < 10% (Fig. 8).

GWAS
The 36,981 SNPs and unsaturated FA content of 194 soybean accessions were used for association analysis via the 3VmrMLM method in 3VmrMLM software [21].QTNs for OA, LLA, and LNA content were calculated from a single environment (3 years of phenotypic data from 2013, 2014, and 2015).QEIs for OA, LLA, and LNA content were calculated using a joint analysis of multiple environments.The threshold of significance for QTNs and QEIs was set at p = 0.05 and LOD score ≥ 3.0.

Differential expression analysis based on RNA-seq
At the R6 stage, 30 soybean varieties with a high content of three unsaturated FAs and a low content of three unsaturated FAs were collected for RNA sequencing (RNA-seq) with two biological replicates.Total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA).The cDNA sequencing libraries of 18 RNA samples were constructed and sequenced, and RNA-seq data were generated using the Illumina platform.Differentially expressed genes (DEGs) were identified using the edgeR package in R software [32].The significance level was set as follows: |log2(fold change)|≥ 1.

Identification of candidate genes
The 100-kb flanking region of each identified QTN and QEI was defined to search for candidate genes according to linkage disequilibrium decay analysis, as described in Zhao et al. [2].Candidate genes for unsaturated FAs were extracted in the following steps.According to previous reports, known genes related to FA content in Arabidopsis were considered references to screen their homologous genes in the soybean genome.The new candidate genes were identified using DEGs for unsaturated FAs.

Metabolite profiling
The non-targeted metabolome was completed by Bioacme Biotechnology Co., Ltd.(Wuhan, China).Briefly, a 100 mg soybean sample was loaded into a 2-mL centrifuge tube, and 300 μL 75% methanol/water was added.The tubes were centrifuged at 12,000 rpm for 15 min at 4 °C.Metabolites were screened and identified using the Metlin database.The differential metabolites were calculated using an orthogonal partial least squaresdiscriminant analysis (OPLS-DA) model, with a variable importance in the projection (VIP) score of ≥ 1 and a |log2 (fold change)| of ≥ 1.

Haplotype analysis and gene-based association analysis of candidate genes
The SNP variation of candidate genes was analyzed based on genome sequencing data.These SNPs were located at the full length of the gene, including exons, intronic regions, and upstream and downstream of the gene.Therefore, in this study, phenotypic data, including high and low total unsaturated FA content, from 50 soybean germplasm resources were used over 3 years to conduct an association analysis.A general linear model (GLM) was used to further determine the association between the SNP variation of candidate genes and unsaturated FA content using TASSEL software [33].Significant SNP variation in candidate genes was considered when the P value was less than 0.01.

Quantitative real-time PCR (qRT-PCR)
Soybean seeds with high and low unsaturated FA content were collected at the R6 stage.Total RNA was extracted using the TRIzol method, and cDNA was generated using the ReverTra Ace qPCR RT Master Mix (TOYOBO, Osaka, Japan).Real-time quantitative PCR (qRT-PCR) was performed on an ABI 7500 fast realtime PCR platform with SYBR Green (TOYOBO, Osaka, Japan).GmACTIN4 was used as an internal control, and the primer sequences for candidate genes are listed in Additional file 1: Table S5.The L-13 soybean seed samples were used as a calibrator.The results of qRT-PCR were calculated using the 2 −ΔΔCT method [34].

Co-expression analysis
The correlation coefficient was calculated between candidate genes and DAM metabolites, and a Pearson correlation cutoff value of 0.5 was generated.Data were visualized using the Cytoscape package [35].

Statistical analysis
Statistical significance was evaluated using Student's t-test performed with SPSS 22.0 software (IBM Corp., Armonk, NY, USA)."*" and "**" represent a significance level of p < 0.05 and p < 0.01, respectively.The mean and standard deviation (mean ± SD) were calculated using the data from three biological replicates.

Fig. 1
Fig. 1 Distribution of oleic, linoleic, and linolenic traits in soybean and Pearson coefficients

Fig. 3
Fig. 3 Multivariate statistical analysis of the transcriptome data in the soybean samples.A Volcano maps in different comparison groups.B Number of differentially expressed genes.The green and yellow columns represent the numbers of genes with upregulated and downregulated expression, respectively.C Upsetplot diagram showing the overlapping DEGs in the three comparison groups

Fig. 4 Fig. 5
Fig. 4 Multivariate statistical analysis of the metabolome data in the soybean samples.A OPLS-DA model analysis.B Number of differential metabolites.The green and yellow columns represent the number of genes that were upregulated and downregulated, respectively.C Upsetplot diagram showing the overlapping DAMs in the three comparison groups

Table 1
Statistical analysis of oleic, linoleic, and linolenic acid traits

Table 2
QTNs identified for unsaturated fatty acid content using the QTN detection model

Table 3
Significant/suggested QEIs for soybean unsaturated fatty acid content in three environments detected using the QTN-by-environment detection model in 3VmrMLM

Table 4
Candidate genes are identified in the transcriptome and QTN detection model

Table 5
Candidate genes are identified in the transcriptome and QEIs detection model